{
    "cells": [
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "f1a9eb90-335c-4214-8bb6-fd1edbe3ccbd",
            "metadata": {},
            "outputs": [],
            "source": [
                "# My OpenAI Key\n",
                "import os\n",
                "os.environ['OPENAI_API_KEY'] = \"INSERT OPENAI KEY\""
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "6a712b56",
            "metadata": {},
            "outputs": [],
            "source": [
                "import logging\n",
                "import sys\n",
                "\n",
                "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
                "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "be3f7baa-1c0a-430b-981b-83ddca9e71f2",
            "metadata": {
                "tags": []
            },
            "source": [
                "## Using GPT Tree Index"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "0881f151-279e-4910-95c7-f49d3d6a4c69",
            "metadata": {},
            "source": [
                "#### [Demo] Default leaf traversal "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "8d0b2364-4806-4656-81e7-3f6e4b910b5b",
            "metadata": {},
            "outputs": [],
            "source": [
                "from llama_index import GPTTreeIndex, SimpleDirectoryReader\n",
                "from IPython.display import Markdown, display"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "id": "1c297fd3-3424-41d8-9d0d-25fe6310ab62",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "documents = SimpleDirectoryReader('data').load_data()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "370fd08f-56ff-4c24-b0c4-c93116a6d482",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "index = GPTTreeIndex.from_documents(documents)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "0b4fe9b6-5762-4e86-b51e-aac45d3ecdb1",
            "metadata": {},
            "outputs": [],
            "source": [
                "index.save_to_disk('index.json')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "id": "5eec265d-211b-4f26-b05b-5b4e7072bc6e",
            "metadata": {},
            "outputs": [],
            "source": [
                "# try loading\n",
                "new_index = GPTTreeIndex.load_from_disk('index.json')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "bd14686d-1c53-4637-9340-3745f2121ae2",
            "metadata": {},
            "outputs": [],
            "source": [
                "# set Logging to DEBUG for more detailed outputs\n",
                "response = new_index.query(\"What did the author do growing up?\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "id": "b4c87d14-d2d8-4d80-89f6-1e5972973528",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/markdown": [
                            "<b>The author wrote short stories and tried to program on an IBM 1401.</b>"
                        ],
                        "text/plain": [
                            "<IPython.core.display.Markdown object>"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "display(Markdown(f\"<b>{response}</b>\"))"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "68c9ebfe-b1b6-4f4e-9278-174346de8c90",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# set Logging to DEBUG for more detailed outputs\n",
                "response = new_index.query(\"What did the author do after his time at Y Combinator?\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "id": "a5ab5943-7c84-4c2b-ac99-ec4b5fc67e64",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/markdown": [
                            "<b>The author went on to start his own company.</b>"
                        ],
                        "text/plain": [
                            "<IPython.core.display.Markdown object>"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "display(Markdown(f\"<b>{response}</b>\"))"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "85c62ec3-c3cf-467e-ab0f-88ffb9f990be",
            "metadata": {},
            "source": [
                "#### [Demo] Leaf traversal with child_branch_factor=2"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "46714db4-9592-4c55-9ca7-916758f2ce68",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# try using branching factor 2\n",
                "response = new_index.query(\"What did the author do growing up?\", child_branch_factor=2)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "id": "1ea7f891-b7e1-497a-a965-14201b220404",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/markdown": [
                            "<b>The author grew up writing simple programs on a TRS-80 computer, as well as trying to program on an IBM 1401. In the early 1990s, the author was a student at the Rhode Island School of Design (RISD) and then the Accademia di Belle Arti in Florence, Italy. They eventually dropped out of RISD and moved to New York City, where they got a job at Interleaf, a software company. While working there, they learned about a new markup language called HTML, which would later become a big part of their life. He also wrote a book on Lisp programming.</b>"
                        ],
                        "text/plain": [
                            "<IPython.core.display.Markdown object>"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "display(Markdown(f\"<b>{response}</b>\"))"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "3c572726-bb95-49c3-a762-d966de59ee5f",
            "metadata": {},
            "source": [
                "#### [Demo] Build Tree Index during Query-Time"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "id": "255fb052-1ff6-4f27-881f-28d4790e9520",
            "metadata": {},
            "outputs": [],
            "source": [
                "documents = SimpleDirectoryReader('data').load_data()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "id": "85371256-292c-473e-9485-7de5c1997a59",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "> [build_index_from_documents] Total token usage: 0 tokens\n"
                    ]
                }
            ],
            "source": [
                "index_light = GPTTreeIndex.from_documents(documents, build_tree=False)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "id": "77b0acb3-5593-4f00-8eef-315a031fedc2",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "> Starting query: What did the author do after his time at Y Combinator?\n",
                        "> Building index from nodes: 5 chunks\n",
                        "0/57\n",
                        "10/57\n",
                        "20/57\n",
                        "30/57\n",
                        "40/57\n",
                        "50/57\n",
                        "> [query] Total token usage: 18200 tokens\n"
                    ]
                },
                {
                    "data": {
                        "text/plain": [
                            "'\\nThe author went back to painting.'"
                        ]
                    },
                    "execution_count": 7,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "index_light.query(\"What did the author do after his time at Y Combinator?\", mode=\"summarize\")"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "f9773497-9aa6-4a16-884a-cd882e63d012",
            "metadata": {},
            "source": [
                "#### [Demo] Build Tree Index with a custom Summary Prompt, directly retrieve answer from root node"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 20,
            "id": "8ab6d3ad-95e1-477a-a0dc-2ce4763ff2c4",
            "metadata": {},
            "outputs": [],
            "source": [
                "from llama_index import SummaryPrompt"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 23,
            "id": "5a91a445-6ab2-457c-850e-79c5386129db",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "> Building index from nodes: 5 chunks\n",
                        "0/57\n",
                        "10/57\n",
                        "20/57\n",
                        "30/57\n",
                        "40/57\n",
                        "50/57\n",
                        "> [build_index_from_documents] Total token usage: 18031 tokens\n"
                    ]
                }
            ],
            "source": [
                "documents = SimpleDirectoryReader('data').load_data()\n",
                "\n",
                "query_str = \"What did the author do growing up?\"\n",
                "SUMMARY_PROMPT_TMPL = (\n",
                "    \"Context information is below. \\n\"\n",
                "    \"---------------------\\n\"\n",
                "    \"{context_str}\"\n",
                "    \"\\n---------------------\\n\"\n",
                "    \"Given the context information and not prior knowledge, \"\n",
                "    f\"answer the question: {query_str}\\n\"\n",
                ")\n",
                "SUMMARY_PROMPT = SummaryPrompt(SUMMARY_PROMPT_TMPL)\n",
                "index_with_query = GPTTreeIndex.from_documents(documents, summary_template=SUMMARY_PROMPT)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "id": "985dad0c-1ede-4576-a4c9-c077b815edd8",
            "metadata": {},
            "outputs": [],
            "source": [
                "index_with_query.save_to_disk(\"index_with_query.json\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "id": "de04fce5-88f9-41b7-87d9-dcde8f84a872",
            "metadata": {},
            "outputs": [],
            "source": [
                "index_with_query = GPTTreeIndex.load_from_disk(\"index_with_query.json\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "9223ffa8-d49d-4de3-821a-701b2a0352d4",
            "metadata": {},
            "outputs": [],
            "source": [
                "# directly retrieve response from root nodes instead of traversing tree\n",
                "response = index_with_query.query(query_str, mode=\"retrieve\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "id": "fdca6970-2f3f-4741-ae98-555db8d3d9a0",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/markdown": [
                            "<b>\n",
                            "The author was homeschooled and then attended a prestigious art school. The author grew up writing essays and thinking about other things he could work on.</b>"
                        ],
                        "text/plain": [
                            "<IPython.core.display.Markdown object>"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "display(Markdown(f\"<b>{response}</b>\"))"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "a6457769-dfaf-4241-ab32-dcf901dde902",
            "metadata": {},
            "source": [
                "## Using GPT Keyword Table Index"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "id": "78d59ef6-70b0-47bb-818d-7237a3b7de75",
            "metadata": {},
            "outputs": [],
            "source": [
                "from llama_index import GPTKeywordTableIndex, SimpleDirectoryReader\n",
                "from IPython.display import Markdown, display"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "id": "5a3f1c67-6d73-4f37-afcf-9e637002fcff",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "> Processing chunk 0 of 6: \t\t\n",
                        "\n",
                        "What I Worked On\n",
                        "\n",
                        "February 2021\n",
                        "\n",
                        "Before col...\n",
                        "> Keywords: ['painting', 'computers', 'programming', 'lisp', 'ai', 'college', 'graduate school', 'graduate', 'school', 'writing']\n",
                        "> Processing chunk 1 of 6: of excluding them, because there were so many s...\n",
                        "> Keywords: ['school', 'students', 'painting', 'florence', 'risd', 'accademia', 'still lives', 'still', 'lives', 'color', 'new york', 'new', 'york', 'yorkville', 'idelle weber', 'idelle', 'weber', 'harvard', 'world wide web', 'world', 'wide', 'web', 'y combinator', 'combinator', 'software', 'lisp']\n",
                        "> Processing chunk 2 of 6: an alarming prospect, because neither of us kne...\n",
                        "> Keywords: ['windows', 'unix', 'lisp', 'web app', 'web', 'app', 'browser', 'store builder', 'store', 'builder', 'ecommerce', 'startup', 'painting']\n",
                        "> Processing chunk 3 of 6: browser, and then host the resulting applicatio...\n",
                        "> Keywords: ['y combinator', 'combinator', 'investment', 'summer founders program', 'summer', 'founders', 'program', 'microsoft', 'goldman sachs', 'goldman', 'sachs']\n",
                        "> Processing chunk 4 of 6: person, and from those we picked 8 to fund. The...\n",
                        "> Keywords: ['y combinator', 'combinator', 'yc', 'lisp', 'bel', 'essays', 'writing', 'software', 'programming', 'arc']\n",
                        "> Processing chunk 5 of 6: it was like living in another country, and sinc...\n",
                        "> Keywords: ['software', 'technology', 'y combinator', 'combinator', 'essays', 'online publishing', 'online', 'publishing', 'venture capital', 'venture', 'capital', 'startups', 'space aliens', 'space', 'aliens', 'lisp']\n"
                    ]
                }
            ],
            "source": [
                "# build keyword index\n",
                "documents = SimpleDirectoryReader('data').load_data()\n",
                "index = GPTKeywordTableIndex.from_documents(documents)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "id": "7ec97988-0190-4df7-b19a-e3130122298f",
            "metadata": {},
            "outputs": [],
            "source": [
                "# save index\n",
                "index.save_to_disk('index_table.json')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 8,
            "id": "d94d0fe0-43c1-41cd-901b-0d748d30f1c7",
            "metadata": {},
            "outputs": [],
            "source": [
                "# reload index\n",
                "index = GPTKeywordTableIndex.load_from_disk('index_table.json')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "id": "69d4f686-6825-49cf-a113-d2fdd484de77",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "> Starting query: What did the author do after his time at Y Combinator?\n",
                        "Extracted keywords: ['y combinator', 'combinator']\n",
                        "> Querying with idx: 7143669651211954504: of excluding them, because there were so many s...\n",
                        "> Querying with idx: 4978118451876167434: browser, and then host the resulting applicatio...\n",
                        "> Querying with idx: 7378313280237489139: person, and from those we picked 8 to fund. The...\n",
                        "> Querying with idx: 2670584622494666310: it was like living in another country, and sinc...\n"
                    ]
                }
            ],
            "source": [
                "# set Logging to DEBUG for more detailed outputs\n",
                "response = index.query(\"What did the author do after his time at Y Combinator?\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "id": "a483514d-4ab5-489d-8b99-7250df491ce3",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/markdown": [
                            "<b>\n",
                            "\n",
                            "After a few years, the author decided to step away from Y Combinator to focus on other projects, such as painting and writing essays. In 2013, he handed over control of Y Combinator to Sam Altman. The author's mother passed away in 2014, and after taking some time to grieve, he returned to writing essays and working on Lisp. He continued working on Lisp until 2019, when he finally completed the project.\n",
                            "\n",
                            "In 2015, the author decided to move to England with his family. They originally intended to only stay for a year, but ended up liking it so much that they remained there. The author wrote Bel while living in England. In 2019, he finally finished the project. After completing Bel, the author wrote a number of essays on various topics. He continued writing essays through 2020, but also started thinking about other things he could work on.</b>"
                        ],
                        "text/plain": [
                            "<IPython.core.display.Markdown object>"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "display(Markdown(f\"<b>{response}</b>\"))"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "aae1bea9-b534-430a-a52b-1f4414957ac9",
            "metadata": {},
            "source": [
                "## Using GPT List Index"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "1aa8c8c1-7fce-4737-9141-d14fd37a779c",
            "metadata": {},
            "outputs": [],
            "source": [
                "from llama_index import GPTListIndex, SimpleDirectoryReader\n",
                "from IPython.display import Markdown, display"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "id": "191caa65-a77f-4d8c-b095-4aed61300ea5",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "> Adding chunk: \t\t\n",
                        "\n",
                        "What I Worked On\n",
                        "\n",
                        "February 2021\n",
                        "\n",
                        "Before col...\n",
                        "> Adding chunk: only up to age 25 and already there are such co...\n",
                        "> Adding chunk: clear that it was even possible. To find out, w...\n",
                        "> Adding chunk: a name for the kind of company Viaweb was, an \"...\n",
                        "> Adding chunk: get their initial set of customers almost entir...\n",
                        "> Adding chunk: had smart people and built impressive technolog...\n",
                        "> [build_index_from_documents] Total token usage: 0 tokens\n"
                    ]
                }
            ],
            "source": [
                "# build linked list index\n",
                "documents = SimpleDirectoryReader('data').load_data()\n",
                "index = GPTListIndex.from_documents(documents)\n",
                "# save index\n",
                "index.save_to_disk('index_list.json')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 14,
            "id": "af2d049d-518d-4ec4-b84f-1fab8aece04f",
            "metadata": {},
            "outputs": [],
            "source": [
                "# load index from disk\n",
                "index = GPTListIndex.load_from_disk('index_list.json')"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "1b3d4bd8-7540-4c6f-8616-ab2d8c6ae2b2",
            "metadata": {},
            "outputs": [],
            "source": [
                "# set Logging to DEBUG for more detailed outputs\n",
                "response = index.query(\"What did the author do after his time at Y Combinator?\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 16,
            "id": "5101b979-175f-490e-9b32-27689fe4b789",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/markdown": [
                            "<b>\n",
                            "\n",
                            "After his time at Y Combinator, the author moved back to Providence to continue at RISD. However, he found that art school was not what he expected it to be and dropped out. He then moved to New York City and started writing a book on Lisp. When that didn't work out, he started a company to put art galleries online. However, that also failed. He then had the idea to start a company to build online stores, which became a success.\n",
                            "\n",
                            "The author then left his position at Yahoo to pursue painting full-time. However, he found it difficult to get back into the painting mindset and eventually returned to New York City. It was there that he had the idea to create a web application that would allow users to create and host their own web applications.</b>"
                        ],
                        "text/plain": [
                            "<IPython.core.display.Markdown object>"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "display(Markdown(f\"<b>{response}</b>\"))"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "65cfce56-853e-431b-888e-946771c3b07e",
            "metadata": {},
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3 (ipykernel)",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.9.16"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5
}
